197 research outputs found
Face-to-face spatial orientation fine-tunes the brain for neurocognitive processing in conversation
We here demonstrate that face-to-face spatial orientation induces a special “social mode” for neurocognitive processing during conversation, even in the absence of visibility. Participants conversed face-to-face, face-to-face but visually occluded, and back-to-back to tease apart effects caused by seeing visual communicative signals and by spatial orientation. Using dual-EEG, we found that 1) listeners’ brains engaged more strongly while conversing in face-to-face than back-to-back, irrespective of the visibility of communicative signals, 2) listeners attended to speech more strongly in a back-to-back compared to a face-to-face spatial orientation without visibility; visual signals further reduced the attention needed; 3) the brains of interlocutors were more in sync in a face-to-face compared to a back-to-back spatial orientation, even when they could not see each other; visual signals further enhanced this pattern. Communicating in face-to-face spatial orientation is thus sufficient to induce a special “social mode” which fine-tunes the brain for neurocognitive processing in conversation.
Speeding up the detection of non-iconic and iconic gestures (SPUDNIG): A toolkit for the automatic detection of hand movements and gestures in video data
In human face-to-face communication, speech is frequently accompanied by visual signals, especially communicative hand gestures. Analyzing these visual signals requires detailed manual annotation of video data, which is often a labor-intensive and time-consuming process. To facilitate this process, we here present SPUDNIG (SPeeding Up the Detection of Non-iconic and Iconic Gestures), a tool to automatize the detection and annotation of hand movements in video data. We provide a detailed description of how SPUDNIG detects hand movement initiation and termination, as well as open-source code and a short tutorial on an easy-to-use graphical user interface (GUI) of our tool. We then provide a proof-of-principle and validation of our method by comparing SPUDNIG’s output to manual annotations of gestures by a human coder. While the tool does not entirely eliminate the need for a human coder (e.g., for false positives detection), our results demonstrate that SPUDNIG can detect both iconic and non-iconic gestures with very high accuracy, and could successfully detect all iconic gestures in our validation dataset. Importantly, SPUDNIG’s output can directly be imported into commonly used annotation tools such as ELAN and ANVIL. We therefore believe that SPUDNIG will be highly relevant for researchers studying multimodal communication due to its annotations significantly accelerating the analysis of large video corpora.
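The abstract above mentions detecting the initiation and termination of hand movements but does not describe the algorithm. As a rough illustration of the general idea only, and not SPUDNIG's actual method, the sketch below marks a movement span wherever the frame-to-frame speed of a tracked hand position exceeds a threshold. The function name `detect_movements`, the input format (one `(x, y)` position per frame), and the parameters `fps`, `speed_threshold`, and `min_frames` are all assumptions made for this example.

```python
import math


def detect_movements(positions, fps=25, speed_threshold=5.0, min_frames=3):
    """Return (start_frame, end_frame) spans in which the hand is moving.

    positions: list of (x, y) hand coordinates, one per video frame.
    All parameter names and default values are hypothetical.
    """
    spans = []
    start = None  # frame index where the current movement began, if any
    for i in range(1, len(positions)):
        (x0, y0), (x1, y1) = positions[i - 1], positions[i]
        # Displacement per frame converted to units (e.g. pixels) per second.
        speed = math.hypot(x1 - x0, y1 - y0) * fps
        moving = speed > speed_threshold
        if moving and start is None:
            start = i  # movement initiation
        elif not moving and start is not None:
            # Movement termination; keep the span only if it is long enough.
            if i - start >= min_frames:
                spans.append((start, i - 1))
            start = None
    # Close a movement that is still ongoing at the end of the video.
    if start is not None and len(positions) - start >= min_frames:
        spans.append((start, len(positions) - 1))
    return spans


# Synthetic track: 5 still frames, 5 frames moving right, 5 still frames.
track = [(0, 0)] * 5 + [(10 * k, 0) for k in range(1, 6)] + [(50, 0)] * 5
print(detect_movements(track))
```

The `min_frames` filter is one simple way to suppress very short spans caused by tracking jitter; as the abstract notes for the real tool, a human coder would still be needed to reject false positives.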
Degree of language experience modulates visual attention to visible speech and iconic gestures during clear and degraded speech comprehension
Visual information conveyed by iconic hand gestures and visible speech can enhance speech comprehension under adverse listening conditions for both native and non-native listeners. However, how a listener allocates visual attention to these articulators during speech comprehension is unknown. We used eye-tracking to investigate whether and how native and highly proficient non-native listeners of Dutch allocated overt eye gaze to visible speech and gestures during clear and degraded speech comprehension. Participants watched video clips of an actress uttering a clear or degraded (6-band noise-vocoded) action verb while performing a gesture or not, and were asked to indicate the word they heard in a cued-recall task. Gestural enhancement was the largest (i.e., a relative reduction in reaction time cost) when speech was degraded for all listeners, but it was stronger for native listeners. Both native and non-native listeners mostly gazed at the face during comprehension, but non-native listeners gazed more often at gestures than native listeners. However, only native but not non-native listeners' gaze allocation to gestures predicted gestural benefit during degraded speech comprehension. We conclude that non-native listeners might gaze at gesture more as it might be more challenging for non-native listeners to resolve the degraded auditory cues and couple those cues to phonological information that is conveyed by visible speech. This diminished phonological knowledge might hinder the use of semantic information that is conveyed by gestures for non-native compared to native listeners. Our results demonstrate that the degree of language experience impacts overt visual attention to visual articulators, resulting in different visual benefits for native versus non-native listeners.
Rapid invisible frequency tagging reveals nonlinear integration of auditory and visual information
During communication in real-life settings, the brain integrates information from auditory and visual modalities to form a unified percept of our environment. In the current magnetoencephalography (MEG) study, we used rapid invisible frequency tagging (RIFT) to generate steady-state evoked fields and investigated the integration of audiovisual information in a semantic context. We presented participants with videos of an actress uttering action verbs (auditory; tagged at 61 Hz) accompanied by a gesture (visual; tagged at 68 Hz, using a projector with a 1440 Hz refresh rate). Integration ease was manipulated by auditory factors (clear/degraded speech) and visual factors (congruent/incongruent gesture). We identified MEG spectral peaks at the individual (61/68 Hz) tagging frequencies. We furthermore observed a peak at the intermodulation frequency of the auditory and visually tagged signals (f_visual - f_auditory = 7 Hz), specifically when integration was easiest (i.e., when speech was clear and accompanied by a congruent gesture). This intermodulation peak is a signature of nonlinear audiovisual integration, and was strongest in left inferior frontal gyrus and left temporal regions; areas known to be involved in speech-gesture integration. The enhanced power at the intermodulation frequency thus reflects the ease of integration and demonstrates that speech-gesture information interacts in higher-order language areas. Furthermore, we provide a proof-of-principle of the use of RIFT to study the integration of audiovisual stimuli, in relation to, for instance, semantic context.
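To illustrate why a peak at the difference frequency indicates nonlinear integration, the following minimal sketch simulates two sinusoids at the tagging frequencies from the abstract (61 Hz and 68 Hz). A purely linear sum contains energy only at 61 and 68 Hz, whereas adding a multiplicative interaction term produces a component at 68 - 61 = 7 Hz, since sin(a)·sin(b) = ½[cos(a - b) - cos(a + b)]. The sampling rate, duration, the unit-weight product term, and the helper `amplitude_at` are assumptions chosen for illustration; this is a toy signal model, not the study's MEG analysis.

```python
import math

FS = 1000          # sampling rate in Hz (assumed for illustration)
DURATION = 1.0     # seconds of simulated signal (assumed)
F_AUDITORY = 61.0  # auditory tagging frequency, from the abstract
F_VISUAL = 68.0    # visual tagging frequency, from the abstract


def amplitude_at(signal, freq, fs=FS):
    """Amplitude of the sinusoidal component at `freq` (single-bin DFT)."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n


t = [i / FS for i in range(int(FS * DURATION))]
auditory = [math.sin(2 * math.pi * F_AUDITORY * ti) for ti in t]
visual = [math.sin(2 * math.pi * F_VISUAL * ti) for ti in t]

# Linear system: the two tagged signals are simply summed.
linear = [a + v for a, v in zip(auditory, visual)]
# Nonlinear system: the sum plus a multiplicative interaction term.
nonlinear = [a + v + a * v for a, v in zip(auditory, visual)]

f_intermod = F_VISUAL - F_AUDITORY  # 7 Hz
linear_peak = amplitude_at(linear, f_intermod)
nonlinear_peak = amplitude_at(nonlinear, f_intermod)

print(f"7 Hz amplitude, linear sum:      {linear_peak:.6f}")
print(f"7 Hz amplitude, with a*v term:   {nonlinear_peak:.6f}")
```

In this toy model the 7 Hz component is absent from the linear sum and appears only once the two signals interact multiplicatively, which is the logic behind reading the intermodulation peak as a marker of nonlinear integration.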
The Effects of Iconic Gestures and Babble Language on Word Intelligibility in Sentence Context
Purpose: This study investigated to what extent iconic co-speech gestures help word intelligibility in sentence context in two different linguistic maskers (native vs. foreign). It was hypothesized that sentence recognition improves with the presence of iconic co-speech gestures and with foreign compared to native babble. Method: Thirty-two native Dutch participants performed a Dutch word recognition task in context in which they were presented with videos in which an actress uttered short Dutch sentences (e.g., Ze begint te openen, “She starts to open”). Participants were presented with a total of six audiovisual conditions: no background noise (i.e., clear condition) without gesture, no background noise with gesture, French babble without gesture, French babble with gesture, Dutch babble without gesture, and Dutch babble with gesture; and they were asked to type down what was said by the Dutch actress. The accurate identification of the action verbs at the end of the target sentences was measured. Results: The results demonstrated that performance on the task was better in the gesture compared to the nongesture conditions (i.e., gesture enhancement effect). In addition, performance was better in French babble than in Dutch babble. Conclusions: Listeners benefit from iconic co-speech gestures during communication and from foreign background speech compared to native. These insights into multimodal communication may be valuable to everyone who engages in multimodal communication and especially to a public who often works in public places where competing speech is present in the background.
The predictive potential of hand gestures during conversation: An investigation of the timing of gestures in relation to speech
In face-to-face conversation, recipients might use the bodily movements of the speaker (e.g. gestures) to facilitate language processing. It has been suggested that one way through which this facilitation may happen is prediction. However, for this to be possible, gestures would need to precede speech, and it is unclear whether this is true during natural conversation. In a corpus of Dutch conversations, we annotated hand gestures that represent semantic information and occurred during questions, and the word(s) which corresponded most closely to the gesturally depicted meaning. Thus, we tested whether representational gestures temporally precede their lexical affiliates. Further, to see whether preceding gestures may indeed facilitate language processing, we asked whether the gesture-speech asynchrony predicts the response time to the question the gesture is part of. Gestures and their strokes (most meaningful movement component) indeed preceded the corresponding lexical information, thus demonstrating their predictive potential. However, while questions with gestures got faster responses than questions without, there was no evidence that questions with larger gesture-speech asynchronies get faster responses. These results suggest that gestures indeed have the potential to facilitate predictive language processing, but further analyses on larger datasets are needed to test for links between asynchrony and processing advantages
Embodied space-pitch associations are shaped by language
Height-pitch associations are claimed to be universal and independent of language, but this claim remains controversial. The present study sheds new light on this debate with a multimodal analysis of individual sound and melody descriptions obtained in an interactive communication paradigm with speakers of Dutch and Farsi. The findings reveal that, in contrast to Dutch speakers, Farsi speakers do not use a height-pitch metaphor consistently in speech. Both Dutch and Farsi speakers’ co-speech gestures did reveal a mapping of higher pitches to higher space and lower pitches to lower space, and this gesture space-pitch mapping tended to co-occur with corresponding spatial words (high-low). However, this mapping was much weaker in Farsi speakers than Dutch speakers. This suggests that cross-linguistic differences shape the conceptualization of pitch and further calls into question the universality of height-pitch associations.
Speakers exhibit a multimodal Lombard effect in noise
In everyday conversation, we are often challenged with communicating in non-ideal settings, such as in noise. Increased speech intensity and larger mouth movements are used to overcome noise in constrained settings (the Lombard effect). How we adapt to noise in face-to-face interaction, the natural environment of human language use, where manual gestures are ubiquitous, is currently unknown. We asked Dutch adults to wear headphones with varying levels of multi-talker babble while attempting to communicate action verbs to one another. Using quantitative motion capture and acoustic analyses, we found that (1) noise is associated with increased speech intensity and enhanced gesture kinematics and mouth movements, and (2) acoustic modulation only occurs when gestures are not present, while kinematic modulation occurs regardless of co-occurring speech. Thus, in face-to-face encounters the Lombard effect is not constrained to speech but is a multimodal phenomenon where the visual channel carries most of the communicative burden